    Second-order Democratic Aggregation

    Aggregated second-order features extracted from deep convolutional networks have been shown to be effective for texture generation, fine-grained recognition, material classification, and scene understanding. In this paper, we study a class of orderless aggregation functions designed to minimize interference, or equalize contributions, in the context of second-order features. We show that they can be computed just as efficiently as their first-order counterparts and have favorable properties over aggregation by summation. Another line of work has shown that matrix power normalization after aggregation can significantly improve the generalization of second-order representations. We show that matrix power normalization implicitly equalizes contributions during aggregation, thus establishing a connection between matrix normalization techniques and prior work on minimizing interference. Based on this analysis, we present γ-democratic aggregators that interpolate between sum pooling (γ=1) and democratic pooling (γ=0), outperforming both on several classification tasks. Moreover, unlike power normalization, γ-democratic aggregations can be computed in a low-dimensional space by sketching, which allows the use of very high-dimensional second-order features. This results in state-of-the-art performance on several datasets.
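
    For intuition, the sketch below mimics the γ-democratic idea with a simple fixed-point iteration over per-feature contribution weights; the function names, the square-root damping, and the iteration count are illustrative assumptions, not the paper's exact dampened solver.

```python
import numpy as np

def gamma_democratic_weights(X, gamma=0.5, n_iter=10, eps=1e-12):
    """Sketch of gamma-democratic weighting for local features X of shape (n, d).

    Approximately solves for weights alpha such that
        alpha_i * sum_j alpha_j * k(x_i, x_j)  ~  (sum_j k(x_i, x_j)) ** gamma,
    using a simple Sinkhorn-style fixed point (a hypothetical simplification of
    the dampened scheme described in the paper). gamma=1 recovers plain sum
    pooling (alpha = 1); gamma=0 equalizes contributions (democratic pooling).
    """
    K = X @ X.T                                  # linear kernel between local features
    K = np.maximum(K, 0)                         # keep the iteration well behaved (assumption)
    target = np.maximum(K.sum(axis=1), eps) ** gamma
    alpha = np.ones(len(X))
    for _ in range(n_iter):
        contrib = alpha * (K @ alpha)            # current contribution of each feature
        alpha = alpha * np.sqrt(target / np.maximum(contrib, eps))
    return alpha

def second_order_aggregate(X, gamma=0.5):
    """Aggregate weighted outer products sum_i alpha_i x_i x_i^T into one descriptor."""
    alpha = gamma_democratic_weights(X, gamma)
    A = (X * alpha[:, None]).T @ X
    return A.flatten()

# toy usage: 100 local CNN features of dimension 64
features = np.random.randn(100, 64)
descriptor = second_order_aggregate(features, gamma=0.5)
```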

    Segmentation Based Interest Points and Evaluation of Unsupervised Image Segmentation Methods

    COLTRANE: ConvolutiOnaL TRAjectory NEtwork for Deep Map Inference

    The process of automatically generating a road map from GPS trajectories, called map inference, remains a challenging task on geospatial data from a variety of domains, as the majority of existing studies focus on road maps in cities. Inherently, existing algorithms are not guaranteed to work on unusual geospatial sites such as airport tarmacs, pedestrianized paths and shortcuts, or animal migration routes. Moreover, deep learning has not been explored well enough for such tasks. This paper introduces COLTRANE, ConvolutiOnaL TRAjectory NEtwork, a novel deep map inference framework which operates on GPS trajectories collected in various environments. The framework includes an Iterated Trajectory Mean Shift (ITMS) module to localize road centerlines, which copes with noisy GPS data points. A Convolutional Neural Network trained on our novel trajectory descriptor is then introduced into the framework to detect and accurately classify junctions for refinement of the road maps. COLTRANE yields up to 37% improvement in F1 score over existing methods on two distinct real-world datasets: city roads and airport tarmac.
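
    The fragment below is a minimal mean-shift sketch in the spirit of localizing centerlines from noisy GPS points; the bandwidth, iteration count, and function names are assumptions for illustration and do not reproduce the paper's ITMS module.

```python
import numpy as np

def mean_shift_centerline(points, bandwidth=5.0, n_iter=10):
    """Shift noisy GPS points towards local density peaks so that points from
    many trajectories collapse onto approximate road centerlines.

    `points` is an (n, 2) array in a metric coordinate system (e.g. metres);
    bandwidth and iteration count are illustrative, not the paper's settings.
    """
    shifted = points.astype(float).copy()
    for _ in range(n_iter):
        # pairwise squared distances between current estimates and the raw points
        d2 = ((shifted[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        w = np.exp(-d2 / (2 * bandwidth ** 2))            # Gaussian kernel weights
        shifted = (w @ points) / w.sum(axis=1, keepdims=True)
    return shifted

# toy usage: jittered points along a straight 100 m road segment
t = np.linspace(0, 100, 200)
gps = np.stack([t, np.zeros_like(t)], axis=1) + np.random.randn(200, 2) * 3.0
centerline_points = mean_shift_centerline(gps, bandwidth=5.0)
```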

    Higher-order occurrence pooling for bags-of-words: visual concept detection

    In object recognition, the Bag-of-Words model assumes: i) extraction of local descriptors from images, ii) embedding the descriptors by a coder into a given visual vocabulary space, which results in mid-level features, and iii) extracting statistics from the mid-level features with a pooling operator that aggregates occurrences of visual words in images into signatures, which we refer to as First-order Occurrence Pooling. This paper investigates higher-order pooling that aggregates over co-occurrences of visual words. We derive Bag-of-Words with Higher-order Occurrence Pooling based on linearisation of the Minor Polynomial Kernel, and extend this model to work with various pooling operators. This approach is then effectively used for fusion of various descriptor types. Moreover, we introduce Higher-order Occurrence Pooling performed directly on local image descriptors, as well as a novel pooling operator that reduces the correlation in the image signatures. Finally, First-, Second-, and Third-order Occurrence Pooling are evaluated given various coders and pooling operators on several widely used benchmarks. The proposed methods are compared to other approaches such as Fisher Vector Encoding and demonstrate improved results.
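
    As a rough illustration, the snippet below aggregates co-occurrences of visual-word activations into a second-order signature; the operator choices and upper-triangle vectorisation are simplifications, not the paper's full derivation from the Minor Polynomial Kernel.

```python
import numpy as np

def second_order_occurrence_pooling(codes, operator="avg"):
    """Sketch of second-order occurrence pooling over mid-level features.

    `codes` is an (n, K) matrix of per-descriptor visual-word activations
    (e.g. soft-assignment codes). Per-descriptor co-occurrences phi_i phi_i^T
    are aggregated with average or max pooling and the upper triangle of the
    symmetric result is kept as the image signature.
    """
    n, K = codes.shape
    cooc = np.einsum('ni,nj->nij', codes, codes)      # per-descriptor co-occurrence matrices
    if operator == "avg":
        pooled = cooc.mean(axis=0)
    elif operator == "max":
        pooled = cooc.max(axis=0)
    else:
        raise ValueError(operator)
    iu = np.triu_indices(K)                           # symmetric matrix: keep upper triangle
    return pooled[iu]

# toy usage: 500 descriptors coded against a 50-word vocabulary
codes = np.abs(np.random.randn(500, 50))
signature = second_order_occurrence_pooling(codes, operator="avg")
```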

    Soft assignment of visual words as Linear Coordinate Coding and optimisation of its reconstruction error

    Visual Word Uncertainty, also referred to as Soft Assignment, is a well-established technique for representing images as histograms by flexible assignment of image descriptors to a visual vocabulary. Recently, the attention of the object category recognition community has been drawn to Linear Coordinate Coding methods. In this work, we focus on Soft Assignment as it yields good results amidst competitive methods. We show that one can take two views on Soft Assignment: as an approach derived from the Gaussian Mixture Model, or as a special case of Linear Coordinate Coding. The latter view helps us propose how to optimise the smoothing factor of Soft Assignment in a way that minimises the descriptor reconstruction error and maximises classification performance. In turn, this renders the tedious cross-validation otherwise needed to establish this parameter unnecessary and makes Soft Assignment a handy technique. We demonstrate state-of-the-art performance of such optimised assignment on two image datasets and several types of descriptors.
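
    A minimal sketch of the two views is given below: soft-assignment codes computed from a Gaussian kernel, their reconstruction error under the Linear Coordinate Coding reading, and a plain grid search over the smoothing factor. The grid search merely stands in for the paper's optimisation, and all names and values are illustrative.

```python
import numpy as np

def soft_assignment(X, vocab, sigma):
    """Soft-assignment codes: alpha_k proportional to exp(-||x - m_k||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - vocab[None, :, :]) ** 2).sum(-1)
    A = np.exp(-d2 / (2 * sigma ** 2))
    return A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)

def reconstruction_error(X, vocab, sigma):
    """Mean error of reconstructing descriptors as convex combinations of visual
    words, i.e. the Linear Coordinate Coding view of Soft Assignment."""
    A = soft_assignment(X, vocab, sigma)
    return np.mean(((X - A @ vocab) ** 2).sum(axis=1))

def pick_sigma(X, vocab, candidates):
    """Choose the smoothing factor minimising reconstruction error (hypothetical
    grid search standing in for the paper's analytical optimisation)."""
    errs = [reconstruction_error(X, vocab, s) for s in candidates]
    return candidates[int(np.argmin(errs))]

# toy usage: 1000 SIFT-like descriptors, 100-word vocabulary
X = np.random.randn(1000, 128)
vocab = np.random.randn(100, 128)
sigma = pick_sigma(X, vocab, candidates=np.array([0.1, 0.5, 1.0, 2.0, 5.0]))
```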

    Protected Pooling Method of Sparse Coding in Visual Classification

    Spatial Coordinate Coding to reduce histogram representations, Dominant Angle and Colour Pyramid Match

    Spatial Pyramid Match lies at the heart of modern object category recognition systems. Once image descriptors are expressed as histograms of visual words, they are further deployed across a spatial pyramid with coarse-to-fine spatial location grids. However, such a representation results in extremely long histogram vectors of 200K or more elements, increasing computational and memory requirements. This paper investigates alternative ways of introducing spatial information during the formation of histograms. Specifically, we propose to apply spatial location information at the descriptor level and refer to this as Spatial Coordinate Coding. Alternatively, the x, y, radius, or angle component is used to perform semi-coding: one of the spatial components is added at the descriptor level whilst Pyramid Match is applied to another. Lastly, we demonstrate that Pyramid Match can be applied robustly to other measurements: Dominant Angle and Colour. We demonstrate state-of-the-art results on two datasets by means of Soft Assignment and Sparse Coding.
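
    The snippet below sketches the descriptor-level idea: normalised coordinates, or a single spatial component such as the radius, are appended to each local descriptor before coding. The weighting parameter and helper names are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def spatial_coordinate_coding(descriptors, xy, image_size, beta=1.0):
    """Append normalised (x, y) locations to local descriptors so spatial layout
    enters at the descriptor level instead of through a spatial pyramid.

    `descriptors` is (n, d), `xy` is (n, 2) pixel locations, `beta` weights the
    spatial part (illustrative parameter).
    """
    w, h = image_size
    xy_norm = xy / np.array([w, h], dtype=float)      # map coordinates to [0, 1]
    return np.hstack([descriptors, beta * xy_norm])

def radius_component(xy, image_size):
    """Single spatial component for semi-coding: normalised radius from the image centre."""
    w, h = image_size
    centre = np.array([w, h], dtype=float) / 2
    r = np.linalg.norm(xy - centre, axis=1)
    return r / r.max()

# toy usage: 300 descriptors from a 640x480 image
desc = np.random.randn(300, 128)
xy = np.random.rand(300, 2) * [640, 480]
augmented = spatial_coordinate_coding(desc, xy, image_size=(640, 480))
```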

    Comparison of mid-level feature coding approaches and pooling strategies in visual concept detection

    Bag-of-Words lies at the heart of modern object category recognition systems. After descriptors are extracted from images, they are expressed as vectors representing visual word content, referred to as mid-level features. In this paper, we review a number of techniques for generating mid-level features, including two variants of Soft Assignment, Locality-constrained Linear Coding, and Sparse Coding. We also isolate the underlying properties that affect their performance. Moreover, we investigate various pooling methods that aggregate mid-level features into vectors representing images. Average pooling, Max-pooling, and a family of likelihood-inspired pooling strategies are scrutinised. We demonstrate how coding schemes and pooling methods interact with each other. We generalise the investigated pooling methods to account for descriptor interdependence and introduce an intuitive concept of improved pooling. We also propose a coding-related improvement to increase its speed. Lastly, state-of-the-art performance in classification is demonstrated on the Caltech101, Flower17, and ImageCLEF11 datasets.
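
    For reference, the toy snippet below applies the two basic pooling operators compared in the paper to the same mid-level codes; the likelihood-inspired and improved variants are omitted, and the names are illustrative.

```python
import numpy as np

def pool(codes, method="avg"):
    """Aggregate per-descriptor mid-level features of shape (n, K) into one
    image signature using average or max pooling."""
    if method == "avg":
        return codes.mean(axis=0)
    if method == "max":
        return codes.max(axis=0)
    raise ValueError(method)

# toy usage: compare the two operators on the same sparse-coding-like codes
codes = np.abs(np.random.randn(400, 200))
sig_avg = pool(codes, "avg")
sig_max = pool(codes, "max")
```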

    Improving few-shot learning by spatially-aware matching and cross-transformer

    Current few-shot learning models capture visual object relations in the so-called meta-learning setting under a fixed-resolution input. However, such models have limited generalization ability under scale and location mismatch between objects, as only a few samples from the target classes are provided. The lack of a mechanism to match the scale and location between pairs of compared images therefore leads to performance degradation. The importance of image contents varies across coarse-to-fine scales depending on the object and its class label; e.g., generic objects and scenes rely on their global appearance, while fine-grained objects rely more on localized visual patterns. In this paper, we study the impact of scale and location mismatch in the few-shot learning scenario and propose a novel Spatially-aware Matching (SM) scheme to effectively perform matching across multiple scales and locations, learning image relations by giving the highest weights to the best matching pairs. SM is trained to activate the most related locations and scales between support and query data. We apply and evaluate SM on various few-shot learning models and backbones for comprehensive evaluation. Furthermore, we leverage an auxiliary self-supervisory discriminator to train and predict the spatial- and scale-level index of the feature vectors we use. Finally, we develop a novel transformer-based pipeline to exploit self- and cross-attention in a spatially-aware matching process. Our proposed design is orthogonal to the choice of backbone and/or comparator.
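
    The toy function below conveys the matching-across-scales-and-locations idea by scoring a support/query pair with the best cosine similarity over all scale and location pairs; the actual SM module learns soft weights over these pairs rather than taking a hard maximum, so this is a simplification with hypothetical names.

```python
import numpy as np

def cosine(a, b, eps=1e-12):
    """Cosine similarity between two 1-D feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def spatially_aware_score(support_maps, query_maps):
    """Each element of support_maps / query_maps is an (h*w, d) grid of feature
    vectors extracted at one scale. The score is the best cosine similarity
    over all pairs of scales and spatial locations (hard-max simplification)."""
    best = -1.0
    for S in support_maps:                 # scales of the support image
        for Q in query_maps:               # scales of the query image
            for s in S:                    # spatial locations in the support grid
                for q in Q:                # spatial locations in the query grid
                    best = max(best, cosine(s, q))
    return best

# toy usage: two scales per image, 4x4 and 2x2 grids of 64-d features
support = [np.random.randn(16, 64), np.random.randn(4, 64)]
query = [np.random.randn(16, 64), np.random.randn(4, 64)]
score = spatially_aware_score(support, query)
```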